Robust dynamic classes revealed by measuring the response function of a social system

We study the relaxation response of a social system after endogenous and exogenous bursts of activity using the time-series of daily views for nearly 5 million videos on YouTube. We find that most activity can be described accurately as a Poisson process. However, we also find hundreds of thousands of examples in which a burst of activity is followed by a ubiquitous power-law relaxation governing the timing of views. We find that these relaxation exponents cluster into three distinct classes, and allow for the classification of collective human dynamics. This is consistent with an epidemic model on a social network containing two ingredients: a power-law distribution of waiting times between cause and action, and an epidemic cascade of actions becoming the cause of future actions. This model is a conceptual extension of the fluctuation-dissipation theorem [2] to social systems, and provides a unique framework for the investigation of timing in complex systems.

INTRODUCTION

Uncovering rules governing collective human behavior is a difficult task because of the myriad of factors that influence an individual's decision to take action. Investigations into the timing of individual activity, as a basis for understanding more complex collective behavior, have reported statistical evidence that human actions range from random to highly correlated. While most of the time the aggregated dynamics of our individual activities create seasonal trends or simple patterns, sometimes our collective action results in blockbusters, best-sellers, and other large-scale trends in financial and cultural markets.

Here, we attempt to understand this nontrivial herding by investigating how the distribution of waiting times describing individuals' activity is modified by the combination of interactions and external influences in a social network. This is achieved by measuring the response function of a social system and distinguishing whether a burst of activity was the result of a cumulative effect of small endogenous factors or instead the response to a large exogenous perturbation. Looking for endogenous and exogenous signatures provides a useful framework for understanding many complex systems and has been successfully applied in several other contexts.

As an illustration of this distinction in a social system, consider the example of trends in queries on internet search engines in figure 1, which shows the remarkable differences in the dynamic response of a social network to major social events. For the "exogenous" catastrophic Asian tsunami of December 26th, 2004, we see that the social network responded suddenly. In contrast, the search activity surrounding the release of a Harry Potter movie has the more "endogenous" signature generated by word-of-mouth, with significant precursory growth and an almost symmetric decay of interest after the release. In both "endo" and "exo" cases there is a significant burst of activity. However, we expect to be able to distinguish the post-peak relaxation dynamics on account of the very different processes that resulted in the bursts. Furthermore, we expect the relaxation process to depend on the interest of the population, since this will influence the ease with which the activity can be spread from generation to generation.

FIG. 1: (color online) Search queries as a proxy for collective human attention. (A) The volume of searches for the word "tsunami" in the aftermath of the catastrophic Asian tsunami. The sudden peak and slow relaxation illustrate the typical signature of an "exogenous" burst of activity. (B) The volume of search queries for "Harry Potter movie". The significant growth preceding the release of the film and symmetric relaxation is characteristic of an "endogenous" burst of activity.



To translate this qualitative distinction into quantitative results, we describe a model of epidemic spreading on a social network and validate it with a data set that is naturally structured to facilitate the separation of this endo/exo dichotomy. Our data consists of nearly 5 million time-series of human activity collected sub-daily over 8 months from the 4th most visited web site (YouTube). At the simplest level, viewing activity can occur in one of three ways: randomly, exogenously (when a video is featured), or endogenously (when a video is shared). This provides us with a natural laboratory for distinguishing the effects of these different influences and allows us to measure the social "response function".

THE MODEL



Predictions of the model 



Various factors may lead to the viewing of a video, including chance, triggering from email, linking from external websites, discussion on blogs, newspapers, and television, and social influences. The epidemic model we apply to the dynamics of viewing behavior on YouTube uses two ingredients whose interplay captures these effects.

The first ingredient is a power-law distribution of waiting times describing human activity [3], which expresses the latent impact of these various factors using a response function that, on the basis of previous work [11, 12], we take to be a long-memory process of the form

$$\phi(t) \sim 1/t^{1+\theta}, \quad \text{with } 0 < \theta < 1. \qquad (1)$$



By definition, the memory kernel $\phi(t)$ describes the distribution of waiting times between "cause" and "action" for an individual. The "cause" can be any of the above-mentioned factors. The "action" is for the individual to view the video in question a time $t$ after she was first subjected to the "cause", without any other influences between $0$ and $t$, corresponding to a direct (or first-generation) effect. In other words, $\phi(t)$ is the "bare" memory kernel or propagator, describing the direct influence of a factor that triggers the individual to view the video in question. Here, the exponent $\theta$ is the key parameter of the theory, which will be determined empirically from the data.
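
As a numerical illustration of the bare kernel, the following is a minimal sketch rather than the procedure used in this work. The power law of equation (1) diverges at $t = 0$, so the sketch assumes a shifted form $\phi(t) = \theta c^{\theta}/(t+c)^{1+\theta}$ with a short-time cutoff `c`; the cutoff, its default value, and the function names are hypothetical and not taken from the text. The shifted form is normalized and keeps the $1/t^{1+\theta}$ tail, and its survival function $(c/(t+c))^{\theta}$ can be inverted to draw waiting times.

```python
import random


def phi(t, theta=0.4, c=1.0):
    """Regularized bare kernel theta * c**theta / (t + c)**(1 + theta).

    The cutoff c is an assumption made only to keep the density finite
    at t = 0; for t >> c the kernel decays as 1 / t**(1 + theta).
    """
    return theta * c ** theta / (t + c) ** (1.0 + theta)


def sample_waiting_time(rng, theta=0.4, c=1.0):
    """Draw one waiting time between cause and action by inverse-CDF sampling.

    The survival function of phi is S(t) = (c / (t + c))**theta, so solving
    S(t) = u for a uniform u in (0, 1] gives t = c * (u**(-1/theta) - 1).
    """
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    return c * (u ** (-1.0 / theta) - 1.0)
```

Because $0 < \theta < 1$, the mean waiting time of this distribution is infinite, which is the sense in which the process has long memory.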

The second ingredient is an epidemic branching process that describes the cascade of influences on the social network. This process captures how previous attention from one individual can spread to others and become the cause that triggers their future attention [14]. In a highly connected network of individuals whose interests make them susceptible to the given video content, a given factor may trigger action through a cascade of intermediate steps. Such an epidemic process can be conveniently modeled by the so-called self-excited Hawkes conditional Poisson process [15]. This gives the instantaneous rate of views $\lambda(t)$ as

$$\lambda(t) = V(t) + \sum_{i,\, t_i \le t} \mu_i \, \phi(t - t_i) \qquad (2)$$

where $\mu_i$ is the number of potential viewers who will be influenced directly over all future times after $t_i$ by person $i$ who viewed a video at time $t_i$. Thus, the existence of well-connected individuals can be accounted for with large values of $\mu_i$. Lastly, $V(t)$ is the exogenous source, which captures all spontaneous views that are not triggered by epidemic effects on the network.

According to our model, the aggregated dynamics can be classified by a combination of the type of disturbance (endo/exo) and the ability of individuals to influence others to action (critical/sub-critical), all of which is linked by a common value of $\theta$. The following classification of behaviors emerges from the interplay of the bare long-memory kernel $\phi(t)$ given by (1) and the epidemic influences across the network modeled by the Hawkes process (2):

• Exogenous sub-critical. When the network is not "ripe" (that is, when connectivity and spreading propensity are relatively small), corresponding to the case when the mean value $\langle \mu \rangle$ of $\mu_i$ is less than 1, the activity generated by an exogenous event at time $t_c$ does not cascade beyond the first few generations, and the activity is proportional to the direct (or "bare") memory function $\phi(t - t_c)$:

$$A_{\text{ex-sc}}(t) \sim \frac{1}{(t - t_c)^{1+\theta}} \qquad (3)$$

• Exogenous critical. If instead the network is "ripe" for a particular video, i.e., $\langle \mu \rangle$ is close to 1, then the bare response is renormalized as the spreading is propagated through many generations of viewers influencing viewers influencing viewers, and the theory predicts the activity to be described by:

$$A_{\text{ex-c}}(t) \sim \frac{1}{(t - t_c)^{1-\theta}} \qquad (4)$$

• Endogenous critical. If, in addition to the network being "ripe", the burst of activity is not the result of an exogenous event but is instead fueled by endogenous (word-of-mouth) growth, the bare response is renormalized, giving the following time-dependence for the view count before and after the peak of activity:

$$A_{\text{en-c}}(t) \sim \frac{1}{|t - t_c|^{1-2\theta}} \qquad (5)$$

• Endogenous sub-critical. Here the response is largely driven by fluctuations, and not bursts of activity. We expect that many time-series in this class will obey a simple stochastic process:

$$A_{\text{en-sc}}(t) \sim \eta(t) \qquad (6)$$

where $\eta(t)$ denotes a noise process.
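
The self-excited process of equation (2) can be explored numerically through its branching representation, in which every view independently triggers a random number of later views. The following is a minimal sketch under stated assumptions, not the procedure used in this work: the source $V(t)$ is taken to be a constant rate plus an optional instantaneous shock, each $\mu_i$ is drawn from a Poisson distribution with mean `mu_mean`, and the kernel is the power law of equation (1) regularized with a hypothetical short-time cutoff `c`. All function and parameter names are illustrative.

```python
import math
import random


def poisson(rng, mean):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_views(rng, horizon, base_rate, mu_mean, theta=0.4, c=1.0, shock=None):
    """Return the sorted view times on [0, horizon] of a branching Hawkes process.

    base_rate : constant exogenous source V(t) (assumption: time-independent)
    mu_mean   : mean number of first-generation views triggered per view;
                keep it below 1 (sub-critical) so that cascades die out
    shock     : optional (time, n_views) pair modeling an exogenous burst
    """
    # Generation zero: spontaneous views from the exogenous source.
    pending = [rng.uniform(0.0, horizon)
               for _ in range(poisson(rng, base_rate * horizon))]
    if shock is not None:
        pending += [shock[0]] * shock[1]

    events = []
    while pending:
        t = pending.pop()
        events.append(t)
        # Each view triggers mu_i later views, delayed by waiting times
        # drawn from the regularized power-law kernel.
        for _ in range(poisson(rng, mu_mean)):
            delay = c * ((1.0 - rng.random()) ** (-1.0 / theta) - 1.0)
            if t + delay <= horizon:
                pending.append(t + delay)
    return sorted(events)
```

Setting `mu_mean` well below 1 with a shock gives a burst that relaxes through only a few generations, while values approaching 1 let the cascade run through many generations, which is the distinction between the sub-critical and critical regimes.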



The dynamics described by the above classifications are illustrated in figure 2. In addition to these classes, the model predicts, by construction, a relationship between the fraction of views obtained on the peak day compared to the total cumulative views (figure 2, inset). For the exogenous sub-critical class, the absence of precursory growth and fast relaxation following a peak imply that close to 100% of the views are contained in the peak. For the exogenous critical class, the fractional views in the peak should be smaller than in the previous case, on account of the content penetrating the network and resulting in a slower relaxation. Finally, for the endogenous critical class, significant precursory growth followed by a slow decay imply that the fractional weight of this peak is very small compared to the total view count. This simple observation will be used as the basis for grouping exponents.



FIG. 2: A schematic view of the four categories described by our model: (A) Endo-subcritical (B) Endo-critical (C) Exo-subcritical (D) Exo-critical. The theory predicts the slope of the response function conditioned on the class of the disturbance (endo/exo) and the susceptibility of the network (critical/sub-critical). Also shown schematically in the pie chart is the fraction of views contained in the main peak relative to the total number of views for each category. This is used as a basis for sorting the time-series into three distinct groups for further analysis of the exponents.



RESULTS 

We find that most videos' dynamics (~ 90%) either do not experience much activity or can be statistically described as a random process (using a Poisson process and Chi-Squared test). This is not inconsistent with the endo-subcritical classification. For the other 10% (≈ 500,000 videos) we find nontrivial herding behavior which accurately obeys the three power-law relations described above. Characteristic examples of endogenous and exogenous dynamics are shown in figure 3.
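
The text names a Poisson process and a Chi-Squared test but does not spell out the procedure. As a stand-in, the following minimal sketch uses the classical index-of-dispersion test: for daily counts drawn from a Poisson process with constant rate, the statistic $\sum_k (x_k - \bar{x})^2 / \bar{x}$ is approximately chi-squared distributed with $n - 1$ degrees of freedom. The choice of this particular test, the Wilson-Hilferty normal approximation used for the tail probability, the significance level, and the function name are all assumptions.

```python
import math


def dispersion_test(counts, alpha=0.05):
    """Test whether daily counts are consistent with a constant-rate Poisson process.

    Returns (statistic, p_value, is_consistent). The p-value is the upper-tail
    probability of the chi-squared statistic, so a small value signals
    over-dispersion such as a burst of activity.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(counts) / n
    if mean <= 0:
        raise ValueError("series has no activity")

    stat = sum((x - mean) ** 2 for x in counts) / mean
    dof = n - 1

    # Wilson-Hilferty approximation: (stat / dof)**(1/3) is roughly normal
    # with mean 1 - 2/(9 dof) and variance 2/(9 dof).
    z = ((stat / dof) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * dof))) \
        / math.sqrt(2.0 / (9.0 * dof))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return stat, p_value, p_value >= alpha
```

A one-sided test is used here because the alternative of interest is burstiness; a series with a single very active day is rejected, while mildly fluctuating counts are not.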

For these videos that experience bursts, we calculate the fraction F of views on the most active day compared to the total count, and define three classes:

1. Class 1 is defined by 80% < F < 100%.

2. Class 2 is defined by 20% < F < 80%.

3. Class 3 is defined by 0% < F < 20%.

FIG. 3: (color online) An illustration of the typical response found in hundreds of thousands of time-series on YouTube. (A) The endogenous-critical class is marked by significant precursory growth followed by an almost symmetric relaxation. (B) The exogenous-critical class is marked by a sudden burst of activity followed by a power-law relaxation. Inset shows the post-peak relaxation on log-log scale, revealing the power-law behavior that lasts over months for both classes.

Should our model have any informative power, we should find the correspondence:

Class 1 ↔ Exogenous sub-critical
Class 2 ↔ Exogenous critical
Class 3 ↔ Endogenous critical
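
The grouping by peak fraction can be sketched in a few lines. This is a minimal illustration, assuming each video is represented by a list of daily view counts; the function names are hypothetical, and because the class definitions use strict inequalities, the assignment of the boundary values F = 80% and F = 20% to the higher-numbered fraction range is an arbitrary choice made here.

```python
def peak_fraction(daily_views):
    """Fraction F of all views that fall on the single most active day."""
    total = sum(daily_views)
    if total <= 0:
        raise ValueError("series has no views")
    return max(daily_views) / total


def classify(daily_views):
    """Map a time-series to Class 1, 2, or 3 from its peak fraction F.

    Class 1 (F above 80%) is expected to be exogenous sub-critical,
    Class 2 (F between 20% and 80%) exogenous critical, and
    Class 3 (F below 20%) endogenous critical.
    """
    f = peak_fraction(daily_views)
    if f >= 0.8:  # boundary handling is an assumption
        return 1
    if f >= 0.2:
        return 2
    return 3
```

For example, a series with 90 of its 100 views on one day falls in Class 1, whereas ten days of equal activity give F = 10% and Class 3.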

Based on this simple classification, we show in figure 4 the histogram of exponents characterizing the power-law relaxation $\sim 1/t^{p}$. The exponents in the various classes cluster into groups, with the most probable exponent in each class given respectively by $p \approx 1.4$, $0.6$, and $0.2$. These results are robust with respect to changes of the threshold percentages. These values are compatible with the predictions (3)-(5) of the epidemic model with a unique value of $\theta = 0.4 \pm 0.1$.
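
The link between a measured relaxation exponent and the kernel exponent can be made concrete. The text does not state how the exponents were fitted, so the following minimal sketch uses an ordinary least-squares fit on log-log axes as a simple stand-in (it is known to be sensitive to noise and to the fitting window). The function names and the convention that entry k of the input holds the views k + 1 days after the peak are assumptions.

```python
import math


def fit_relaxation_exponent(views_after_peak):
    """Estimate p in views ~ 1 / t**p by least squares on log-log axes.

    views_after_peak[k] is taken to be the view count k + 1 days after
    the peak; days with zero views are skipped.
    """
    pts = [(math.log(k + 1), math.log(v))
           for k, v in enumerate(views_after_peak) if v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive observations")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return -sxy / sxx  # the slope is -p


def theta_from_exponent(p, video_class):
    """Invert the predicted exponents 1 + theta, 1 - theta, 1 - 2 theta."""
    if video_class == 1:      # exogenous sub-critical
        return p - 1.0
    if video_class == 2:      # exogenous critical
        return 1.0 - p
    if video_class == 3:      # endogenous critical
        return (1.0 - p) / 2.0
    raise ValueError("video_class must be 1, 2, or 3")
```

Applied to the most probable exponents quoted above, the three classes give $1.4 - 1$, $1 - 0.6$, and $(1 - 0.2)/2$, all equal to 0.4, which is the sense in which a single value of $\theta$ accounts for all three.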

Having empirically extracted a value of $\theta$, a further test of the model is provided by asking whether the dynamics of videos with these exponents are consistent with the description of the model. Here, the test of the epidemic model lies in the precursory dynamics. We check this by performing a peak-centered, aggregate sum for all videos with exponents near either 1.4, 0.6, or 0.2, with the intent of visualizing the time evolution. Each time-series is first normalized to 1 to prevent a single video from dominating the sum, and the final result is divided by the number of videos in each set so that we can compare the three classes. The model predicts, and we indeed observe in figure 5, that videos whose post-peak dynamics is governed by small exponents have significantly more precursory growth. One also sees very little precursory growth for the two exogenous classes. Since by construction our grouping was based on the exponent characterizing the relaxation after the peak, it is not surprising to see faster decays for those videos with exponents near 1.4 compared with those near 0.6 and 0.2.

FIG. 4: (color online) Histogram of the exponents $p$ of the power-law relaxation $\sim 1/t^{p}$ of the view counts following a peak belonging respectively to Class 1 (dashed green line), Class 2 (dotted blue line) and Class 3 (continuous red line). The predicted values for the exogenous sub-critical class (equation (3)), the exogenous critical class (equation (4)) and the endogenous critical class (equation (5)) are shown by the vertical dashed lines, with their quantitative values determined with the choice $\theta = 0.4$.

FIG. 5: (color online) Test of the precursory dynamics. The (endo/exo) and (critical/subcritical) classification is based on measuring the exponent governing the relaxation after a peak. However, the epidemic branching model also predicts significant precursory growth before a peak for the endogenous class with $p$ centered on $1 - 2\theta \approx 0.2$ and no precursory growth for the two exogenous classes. The figure shows the peak-centered, aggregate sum for all videos with exponents near either 1.4 (ex-sc), 0.6 (ex-c), or 0.2 (en-c). We observe that videos classified as endogenous-critical (continuous red line) on the basis of their relaxation exponent indeed have significantly more precursory growth. One also sees very little precursory growth for the two exogenous classes.
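
The peak-centered aggregate described above can be sketched as follows. This is a minimal illustration under stated assumptions: "normalized to 1" is interpreted as dividing each series by its total view count (normalizing by the peak value would be an equally plausible reading), days that fall outside a series contribute zero, and the function name and `half_window` parameter are hypothetical.

```python
def peak_centered_average(series_list, half_window):
    """Average normalized view counts in a window centered on each peak.

    Returns a list of length 2 * half_window + 1 whose middle entry
    corresponds to the peak day; earlier entries show precursory growth
    and later entries show the relaxation.
    """
    width = 2 * half_window + 1
    total = [0.0] * width
    used = 0
    for series in series_list:
        norm = sum(series)
        if norm <= 0:
            continue  # skip series with no views
        used += 1
        peak = max(range(len(series)), key=series.__getitem__)
        for offset in range(-half_window, half_window + 1):
            day = peak + offset
            if 0 <= day < len(series):
                total[offset + half_window] += series[day] / norm
    if used == 0:
        raise ValueError("no usable time-series")
    # Divide by the number of videos so that sets of different size compare.
    return [value / used for value in total]
```

Running this separately on the videos whose exponents lie near 1.4, 0.6, and 0.2 gives three curves that can be compared directly, since each is an average of series with unit total weight.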



DISCUSSION 

These results provide direct evidence that collective human dynamics can be robustly classified by epidemic models, and understood as the transformation of the distribution of individual waiting times by exogenous and endogenous forces. One of the surprising results is that the various classes are related by a common value of $\theta$. While it is not expected that $\theta$ is universal in human systems, it is similar to what has been found in other studies. This provides a possible measure of the strength of interactions in a social network, and will be the subject of future work.

In addition to these results, understanding collective human dynamics opens the possibility for a number of tantalizing applications. It is natural to suggest a qualitative labeling that is quantitatively consistent with the three classes derived from the model: viral, quality, and junk videos. Viral videos are those with precursory word-of-mouth growth resulting from epidemic-like propagation through a social network, and correspond to the endogenous critical class with an exponent $(1 - 2\theta)$. Quality videos are similar to viral videos but experience a sudden burst of activity rather than a bottom-up growth. Because of the "quality" of their content, they subsequently trigger an epidemic cascade through the social network. These correspond to the exogenous critical class, relaxing with an exponent $(1 - \theta)$. Lastly, junk videos are those that experience a burst of activity for some reason (spam, chance, etc.) but do not spread through the social network. Therefore their activity is determined largely by the first generation of viewers, corresponding to the exogenous sub-critical class, and they should relax with an exponent $(1 + \theta)$. While one might argue that these labels are inherently subjective, they reflect the objective measure contained in the collective response to events and information. This is further supported by the average number of total views in each class, which is largest for "viral" (33,693 views) and smallest for "junk" (16,524 views), as one would expect.

While the above description applies to videos, one could extend this technique to the realm of books, movies, and other commercial products, perhaps using sales as a proxy for measuring the relaxation of individual activity. The proposed method for classifying content has the important advantage of robustness, as it does not rely on qualitative judgment, using information revealed by the dynamics of the human activity as the referee. More importantly, the method does not rely on the magnitude of the response, because of the scale-free nature of the relaxation dynamics. This implies that identification of relevance, or lack of relevance, can be made for content that has mass appeal, along with that which appeals to more specialized communities. Furthermore, this framework could be used to provide a quantitative measure of the effectiveness of marketing campaigns by measuring the sales response to an "exogenous" marketing event.

A tenet of complex systems theory is that many seemingly disparate and unrelated systems actually share an underlying universal behavior. In the digital age, we now have access to unprecedented stores of data on human activity. This data is usually almost trivial to acquire, in both time and money, when compared with 'traditional' measurements. If the complex behavior in social systems is shared by other complex systems, then our approach, which disentangles the individual response from the collective, may provide a useful framework for the study of their dynamics.